Profitability Analysis of a Bicycle Company by Ruby Villacorta

The data contains sales transactions of a bicycle company called GBI. The bycicle company is managed by SAP ERP. The data is obtained through the SAP University Alliance for study purposes.

Univariate Plots Section

##  [1] "Country"                          "Year"                            
##  [3] "Quarter"                          "Month"                           
##  [5] "Day"                              "Quote.Date"                      
##  [7] "Quote.Number"                     "Sales.Order.Create.Date"         
##  [9] "Sales.Order.Number"               "Post.Goods.Issue.Date"           
## [11] "Delivery.Number"                  "Billing.Date"                    
## [13] "Billing.Document.Number"          "Payment.Receipt.Date"            
## [15] "Accounting.Document.Number"       "Customer"                        
## [17] "Customer.Name"                    "City"                            
## [19] "Material.Master.Description"      "Material.Group.Description"      
## [21] "Sales.Org.Description"            "Division.Description"            
## [23] "Distribution.Channel"             "Distribution.Channel.Description"
## [25] "Unit.of.Measure"                  "Price.per.Unit.EUR"              
## [27] "Unit.Cost.at.Goods.Issue.EUR"     "Discount.EUR"                    
## [29] "Quantity"                         "Revenue.EUR"                     
## [31] "Cost.of.Goods.Sold.EUR"           "Profit.Margin.EUR"

Displaying the names of variable in order to determine the ones to investigate.

##       Germany United States 
##         11347         36153

This shows the countries where GBI operates.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    2006    2007    2010    2009    2012    2013

GBI sales transactions are from 2006 to 2013.

Now, I create a new variable denoting the profit margin in percent. Profit Margin % = Profit Margin / Revenue.

## 'data.frame':    47500 obs. of  33 variables:
##  $ Country                         : Factor w/ 2 levels "Germany","United States": 2 2 2 2 2 2 2 2 1 1 ...
##  $ Year                            : int  2006 2006 2006 2006 2006 2006 2006 2006 2006 2006 ...
##  $ Quarter                         : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Month                           : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Day                             : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Quote.Date                      : Factor w/ 2692 levels "1/1/06","1/1/07",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Quote.Number                    : int  20000000 20000001 20000001 20000001 20000001 20000001 20000001 20000003 20000004 20000005 ...
##  $ Sales.Order.Create.Date         : Factor w/ 2711 levels "1/1/07","1/1/08",..: 206 206 206 206 206 206 206 206 206 206 ...
##  $ Sales.Order.Number              : int  2 1 1 1 1 1 1 3 4 5 ...
##  $ Post.Goods.Issue.Date           : Factor w/ 2701 levels "","1/1/07","1/1/08",..: 940 1022 1022 1022 1022 1022 1022 940 1054 1022 ...
##  $ Delivery.Number                 : int  80000001 80000012 80000012 80000012 80000012 80000012 80000012 80000005 80000025 80000006 ...
##  $ Billing.Date                    : Factor w/ 2699 levels "","1/1/07","1/1/08",..: 1027 1027 1027 1027 1027 1027 1027 1019 1067 1019 ...
##  $ Billing.Document.Number         : int  90000011 90000007 90000007 90000007 90000007 90000007 90000007 90000004 90000026 90000005 ...
##  $ Payment.Receipt.Date            : Factor w/ 2685 levels "","1/1/07","1/1/08",..: 886 870 870 870 870 870 870 855 916 855 ...
##  $ Accounting.Document.Number      : int  1400000014 1400000006 1400000006 1400000006 1400000006 1400000006 1400000006 1400000003 1400000008 1400000002 ...
##  $ Customer                        : int  1000 7000 7000 7000 7000 7000 7000 4000 18000 13000 ...
##  $ Customer.Name                   : Factor w/ 24 levels "Airport Bikes",..: 20 11 11 11 11 11 11 16 9 1 ...
##  $ City                            : Factor w/ 23 levels "Anklam","Atlanta",..: 7 10 10 10 10 10 10 2 15 9 ...
##  $ Material.Master.Description     : Factor w/ 18 levels "Air Pump","Deluxe Touring Bike (black)",..: 13 5 12 7 4 16 2 13 13 13 ...
##  $ Material.Group.Description      : Factor w/ 2 levels "Finished Bikes",..: 2 2 1 2 1 2 1 2 2 2 ...
##  $ Sales.Org.Description           : Factor w/ 4 levels "Germany North",..: 4 3 3 3 3 3 3 3 1 2 ...
##  $ Division.Description            : Factor w/ 2 levels "Accessories",..: 1 1 2 1 2 1 2 1 1 1 ...
##  $ Distribution.Channel            : Factor w/ 1 level "WH": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Distribution.Channel.Description: Factor w/ 1 level "Wholesale": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Unit.of.Measure                 : Factor w/ 1 level "EA": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Price.per.Unit.EUR              : num  22 51.7 2208.2 51.7 2072.6 ...
##  $ Unit.Cost.at.Goods.Issue.EUR    : num  11.7 27.6 1085 14.4 1013 ...
##  $ Discount.EUR                    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ Quantity                        : int  11 11 8 15 7 12 6 11 7 6 ...
##  $ Revenue.EUR                     : num  242 569 17666 776 14508 ...
##  $ Cost.of.Goods.Sold.EUR          : num  128 303 8680 216 7091 ...
##  $ Profit.Margin.EUR               : num  114 266 8985 559 7417 ...
##  $ Profit.Margin.pc                : num  0.471 0.467 0.509 0.721 0.511 ...

This displays the type of variables available and their format.

Profit margin = Revenue - COGS The plot shows the counts of the profit margin of material sales. It shows that the majority of profit margin of material sales are low. This need to be further investigated.

The distribution is similar to above (not surprise).

GBI sales transations are higher in United States than Germany.

GBI has two divisions: Accessories and Bicycles. Most of low profit margins come from accessories. One can observed a normal distribution of the bicycles profit margin.

#change the format of the "Month" and "Year" variable
gbi$Month <- factor(gbi$Month)
gbi$Year <- factor(gbi$Year)

str(gbi$Month)
##  Factor w/ 12 levels "1","2","3","4",..: 1 1 1 1 1 1 1 1 1 1 ...
#Rename the months from number to letters
gbi$Month <- revalue(x = gbi$Month, c("1" = "January", '2' = "February", 
                 '3' = "March", 
                 '4' = "April", 
                 '5' = "May", 
                 '6' = "June", 
                 '7' = "July", 
                 '8' = "August", 
                 '9' = "September", 
                 '10' = "October", 
                 '11' = "November", 
                 '12' = "December"))
                
ggplot(data = gbi, aes(x = Profit.Margin.EUR)) +
  geom_histogram(binwidth = 100) + 
  geom_histogram(color = 'blue', fill = '#099DD9')  +
  facet_wrap(~Month) +
  ggtitle("Distribution of Profit Margins across Months") +
  xlab("Proft Margin in Euros")

The months of March and April have more material sale transactions. The spring months of the North Hemisphere seem to be most important for the bicycle business.

#This create the sum of all the transaction belonging to a sales order number
Profit.Margin <- tapply(gbi$Profit.Margin.EUR, gbi$Sales.Order.Number, sum)

#Create a new dataframe with the array created above
df <- data.frame(Profit.Margin)

#Histogram plot
ggplot(aes(x = Profit.Margin), data = df) +
  geom_histogram(binwidth = 5000) +
  ggtitle("Profit Margin per Sale Order Number ")

Purpose: to check if there is a difference in the distribution based on the sale order number. Outcome: Thit plot has similar shape as most of the plots shown before. The main difference is amount of the profit margin. This is because each sale order number has several material sale transactions.

Univariate Analysis

What is the structure of your dataset?

The dataset has 47,500 observations of sales data and 32 variables.

What is/are the main feature(s) of interest in your dataset?

A key variable of interest is the Profit Margin.

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

Other features in the dataset that will help support the investigation are: Division, Country, Customer, Year of transaction.

It is important to know:

Profit margin = Revenue - Cost of Good Sold (COGS)

The overarching question is to identify which factors contribute a high profit margin. The factors could be customers, materials, month, etc.

Did you create any new variables from existing variables in the dataset?

Yes, the new variable is the Profit Margin Percent.

Of the features you investigated, were there any unusual distributions?

Yes, there is a high number of low profit margins. When splitting the analysis into divisions it become apparent the low profit margin belong to the division “Accesories”. For the following analysis, the focus is on the other division (“Bicycles”).

Bivariate Plots Section

gbi.bicycles <- subset(gbi, Division.Description == "Bicycles")
gbi.accesories <- subset(gbi, Division.Description != "Bicycles")

This creates two dataframes where the division is filtered to Bicycles and Accesories. Moving forward, I will only analyse the Bicycles division.

There is a number of transactions without discounts. Discounts start around 1200 euros. Growing discount also means growing profit margin. Good news, discounts are only given when profit margins are adequate.

The higher the revenue the higher the profit margin. The countries show similar relationships of revenue and profit. There are three distinct slopes. I’ll run few more plots in order to determine the influence such slopes.

The three distinct slopes do not arise because of different material.

profit.margin.1 <- subset(gbi, Profit.Margin.pc == 1)

995 observations do not have COGS. Further investigation required!

The top slope is because of incomplete data of 2013. This data will be removed in the following dataframe.

gbi.bicycles1 <- subset(gbi.bicycles, Year != 2013)
gbi.accesories1 <- subset(gbi.accesories, Year != 2013)

Dataframes without 2013 year.

The two distinct slopes do not arise because of different customers. A binary variable will be created for discount in order to verify whether this influences the relatinship between revenue and profit margin.

#generate a dummy variable to determine if a discount was given
gbi.bicycles1$Discount.Given <- ifelse(gbi.bicycles1$Discount.EUR > 0, 1, 0)

gbi.bicycles1$Discount.Given <- factor(gbi.bicycles1$Discount.Given)

When discount is given “1”, these same levels of revenue translate into lower profit margin.
The relationship of revenue and profit margin is weaker when discounts are given.

This plot is similar to the one in the Univariate Plots Section, the difference is that now is filtered to Bicycles division and does not include 2013 because of incomplete data. As we can see the months of March and April have the higher peeks.

High price high profit margin. The price seem to be increasing over the years. Yet there is a big range of price for the same year of the same product. For example, 2012 there are prices of more than 3000 euros but also aproximately 2300 euros. The y-axis is profit margin per piece (not profit margin of the overall transaction). As expected there is a positive relationship between price per unit and the profit margin per unit.

The Women’s Off Road Bike has the lowest profit margin.

Knee pads has the largest profit margin of the Accesories division.

Different prices per unit have been charged. This could be a point for further investigation.

Different profit margin per unit have been realised. Men’s Off Road Bike has the lowest profit margin per piece. However, the overall profit margin is higher than the Women’s Off Road Bike.

Profit Margin started increasing from 2008 to 2012.

The same pattern as the previous plot happens with the Accesries division. This is just confirm whether the trend was the same for both divisions.

Price per Unit in Euros are more expensive in Germany than in United States across all finished bicycles

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Profit margin has increased over the years. Material type influences profit margin. After observing the relationship between Revenue and Profit Margin. I had to investigate why there were those three distinc slopes. Multivaria variables plot had to be perfomed.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

There is an interesting relationship between prices per unit and Countries where GBI operates. The prices are higher in Germany than the United States.

What was the strongest relationship you found?

Revenue and profit margin has a strong relationship.

Multivariate Plots Section

There is a positive relationship between Revenue and Profit Margin (not surprising). Interestingly, with higher revenue values the positive relationship is weakening (smooth/thick blue line).

This could be because discount is given in bigger transactions.

There is a high variation of how much profit you make per revenue even within a material (See chart below). This could be because of other factors such as Year, Customer, Country.

Profit Margin has increased trend across all Bicycle products.

Profit Margin percent for most of the poducts is around 50%. This should be this high as there are fixed costs and other expenses that will decrease the profit margin further.

The plot shows the all finished bicycles have profit margins well above 40%.

This shows there is not a clear high profit margin customer of United States in any of the given years. Further, one can observe DC Bikes and Motown Bikes are not longer clients.

Same as previous plot, not a key customer of United States that provides high turnover.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

There is a positive relationship between Revenue and Profit Margin with a high variation of how much profit you make per revenue (per material). In the Bivariate analysis, we observed the price is different within one type of Bicycle. Plots of different customers and material did not show a clear exlanation for those variations (Revenue and Profit Margin).

Were there any interesting or surprising interactions between features?

Yes, the increase in profit margin percent across all bicycles material.


Final Plots and Summary

Plot One

Description One

This plot was analyse in both Univariate and Bivariate section. The months of March and April have more material (Bicycles) sale transactions. The spring months of the North Hemisphere seem to be most important for the bicycle business. I pick this plot because it is important for a business to know how seasonal the business is. This affect material management and cash management overall.

Plot Two

Description Two

This plot comes from Multivariate section. The profit margin is important for business to know. The plot shows the all finished bicycles have profit margins well above 40%. The good news is also that from 2006 to 2012 the trend has increased for all bicycles.

Plot Three

Description Three

This plot comes from Bivariate section. When discount is given “1”, these same levels of revenue translate into lower profit margin.
The relationship of revenue and profit margin is weaker when discounts are given. It is important for business to know how discounts influences the profit margin. In this case, discounts are given for relatively high revenue transactions. This generally make sense for a business to do in order to get higher turnover.


Reflection

The GBI dataset contains information on different sales transactions with a total 47,500 observations. For the purpose of this analysis I filtered the data to one of the material divistion sales transactions (“Bicycles”) from 2006 to 2012. This reduced the observations to 23,513.

The overarch goal was to identify strategies for the company in the future. Which region, customer, material, etc to focus on. Whether there are any downwards or upwards trends?

These were some of the original questions:

These questions were answered in the section “Final Plots and Summary”.

In conclusion, there were no clear customer that were more profitable than others. All customers contribute in the overall profit margin of GBI. Between Bicycles and Accesories Material Divisions, Bicycles were clearly the most profitable. However, it is important to note that in business like these it is important to provide a wide range of different product as support.

Germany has higher price per unit than United States. However, there are more sale transactions of USA customers.

Struggles/ Successes

While 33 variables seem like much, I was able to determine the main variables after doing a quick look at the names and overall data. This lead to determine the main key variable of interest “Profit Margin”.

Finding strong relationship between profit margin and other variables other than revenue were a struggle at first. Some plots have led me to dead end. However, it is important to mention that while dead ends can be time consuming and frustrating, they eventually lead to variables of interest, and help me narrow down the exploration criteria significantly.

I think the most frustrating part for me was to rename the labels of the first plot from the “Final Plots and Summary”" section. I looked for a way to rename them without changing the dataframe. I could not find a suitable solution so I decided to rename the values of the variable “Months” within the dataframe instead.